Skip to content

CASSANALYTICS-151: Cannot read start offset from BTI with big partitions#199

Open
lukasz-antoniak wants to merge 2 commits intoapache:trunkfrom
lukasz-antoniak:CASSANALYTICS-151
Open

CASSANALYTICS-151: Cannot read start offset from BTI with big partitions#199
lukasz-antoniak wants to merge 2 commits intoapache:trunkfrom
lukasz-antoniak:CASSANALYTICS-151

Conversation

@lukasz-antoniak
Copy link
Copy Markdown
Member

Copy link
Copy Markdown
Member

@michaelsembwever michaelsembwever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 tested, fixes the issue

try
{
withPartitionIndex(ssTable, descriptor, metadata, true, false, (dataFileHandle, partitionFileHandle, rowFileHandle, partitionIndex) -> {
withPartitionIndex(ssTable, descriptor, metadata, true, true, (dataFileHandle, partitionFileHandle, rowFileHandle, partitionIndex) -> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simplicity, I think loadDataFile and loadRowsIndex are always true. Can we simplify the method signature?

In Cassandra's implementation, row index component is always opened.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thank you. Code updated.

{
LOGGER.error("Missing key key={} token={} partitioner={}",
key,
toToken(partitioner, index),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be toToken(partitioner, i), since i is the partition index. Please rename i too.

Meanwhile, the variable index can be removed.

{
LOGGER.error("Key read by more than 1 Spark partition key={} token={} partitioner={}",
key,
toToken(partitioner, index),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, it should be toToken(partitioner, i)

partitioner.name());
}
else if (count > 1)
for (int j = 0; j < counts[i].length; j++)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename j to rowIndexInPartition?

MutableInt skippedPartitions = new MutableInt(0);
MutableLong skippedDataOffsets = new MutableLong(0);
int[] counts = new int[numKeys];
int[][] counts = new int[numPartitions][numRowsPerPartition];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename counts to partitions

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this two-dimension array stores really the count of views for each partition and row, so maybe we should leave it as counts?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the change in this file (for 4.0) necessary? The bug fixed is only in the BTI format code path.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My motivation was to keep both classes in-sync unless changes cannot be applied. I have rolled back the change in four-zero bridge now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants